heterogeneous sources

Terms from Artificial Intelligence: humans at the heart of algorithms


We say that data comes from heterogeneous sources if it originates in different organisations, or from different kinds of data sources. The data itself may be heterogeneous, mixing numeric and symbolic values, but even the same data value may be represented differently: for example, dates may appear in different formats, or names may be given as a whole in some places (e.g. "Alan Dix") and coded in parts in others (e.g. {given:"Alan",family:"Dix"}). This creates many challenges. One needs either to transform all the data sources into a single standard format, often involving substantial data wrangling, or to create some form of mapping between the data formats. Connecting the different data sources is also a challenge: the different name formats may make the same name appear differently, and conversely the same name in two data sets may refer to different people or things.
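As a rough illustration of the first option (transforming everything into a single standard format), the following Python sketch normalises two records that describe the same person in different ways. The field names, the sample dates, the list of date formats and the helper functions are all hypothetical, chosen only for this example; real data wrangling usually needs source-specific rules.

```python
from datetime import datetime

# Assumed date formats, tried in order. This is a simplification: a string such
# as "04/03/2021" is ambiguous unless the source's convention is known.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def normalise_date(text):
    """Return an ISO 8601 date string (YYYY-MM-DD) using the first format that parses."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {text!r}")


def normalise_name(name):
    """Return a {given, family} dict from a whole-name string or a dict of parts."""
    if isinstance(name, dict):
        return {"given": name["given"].strip(), "family": name["family"].strip()}
    # Naive assumption: the last space-separated word is the family name.
    given, _, family = name.strip().rpartition(" ")
    return {"given": given, "family": family}


def normalise_record(record):
    """Map one source record onto the common format used for comparison."""
    return {
        "name": normalise_name(record["name"]),
        "date": normalise_date(record["date"]),
    }


# Two invented records from different (hypothetical) sources.
source_a = {"name": "Alan Dix", "date": "2021-03-04"}
source_b = {"name": {"given": "Alan", "family": "Dix"}, "date": "04/03/2021"}

# After normalisation the two records can be compared field by field.
same = normalise_record(source_a) == normalise_record(source_b)
```

Even when the normalised records are equal, this only shows that the values match; it does not establish that both records refer to the same person, which is the linking problem described above.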

Used in Chap. 10: page 134

Used in glossary entries: data wrangling